Rogerio Feris

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

May 31, 2026
Via arXiv

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

May 18, 2026
Via arXiv

TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning

Apr 01, 2026
Via arXiv

ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding

Mar 28, 2026
Via arXiv

CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models

Feb 06, 2026
Via arXiv

Latent Implicit Visual Reasoning

Dec 24, 2025
Via arXiv

DAVE: A VLM Vision Encoder for Document Understanding and Web Agents

Dec 19, 2025
Via arXiv

Activation Reward Models for Few-Shot Model Alignment

Jul 02, 2025
Via arXiv

Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion

May 23, 2025
Via arXiv

Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?

May 14, 2025
Via arXiv